Error analysis of a public domain pronunciation dictionary

نویسندگان

  • Olga Martirosian
  • Marelie Davel
چکیده

We explore pattern recognition techniques for verifying the correctness of a pronunciation lexicon, focusing on techniques that require limited human interaction. We evaluate the British English Example Pronunciation (BEEP) dictionary [1], a popular public domain resource that is widely used in English speech processing systems. The techniques being investigated are applied to the lexicon and the results of each step are illustrated using sample entries. We find that as many as 5553 words in the BEEP dictionary are incorrect. We demonstrate the effect of correction techniques on a lexicon and implement the lexicon in an automatic speech recognition (ASR) system.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Enhanced tree clustering with single pronunciation dictionary for conversational speech recognition

Modeling pronunciation variation is key for recognizing conversational speech. Rather than being limited to dictionary modeling, we argue that triphone clustering is an integral part of pronunciation modeling. We propose a new approach called enhanced tree clustering. This approach, in contrast to traditional decision tree based state tying, allows parameter sharing across phonemes. We show tha...

متن کامل

Kanji-to-Hiragana Conversion Based on a Length-Constrained -Gram Analysis

A common problem in speech processing is the conversion of the written form of a language to a set of phonetic symbols representing the pronunciation. In this paper, we focus on an aspect of this problem specific to the Japanese language. Written Japanese consists of a mixture of three types of symbols: kanji, hiragana, and katakana. We describe an algorithm for converting conventional Japanese...

متن کامل

Automatic Error Recovery for Pronunciation Dictionaries

In this paper, we present our latest investigations on pronunciation modeling and its impact on ASR. We propose completely automatic methods to detect, remove, and substitute inconsistent or flawed entries in pronunciation dictionaries. The experiments were conducted on different tasks, namely (1) word-pronunciation pairs from the Czech, English, French, German, Polish, and Spanish Wiktionary [...

متن کامل

Analysis of phonetic transcriptions for Danish automatic speech recognition

Automatic speech recognition (ASR) relies on three resources: audio, orthographic transcriptions and a pronunciation dictionary. The dictionary or lexicon maps orthographic words to sequences of phones or phonemes that represent the pronunciation of the corresponding word. The quality of a speech recognition system depends heavily on the dictionary and the transcriptions therein. This paper pre...

متن کامل

Implicit Pronunciation Modelling in Asr

Modelling of pronunciation variability is an important part of the acoustic model of a speech recognition system. Good pronunciation models contribute to the robustness and portability of a speech recogniser. Usually pronunciation modelling is associated with the recognition lexicon which allows a direct control of HMM selection. However, in state-of-the-art systems the use of clustering techni...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007